
    Imbalanced Ensemble Classifier for learning from imbalanced business school data set

    Private business schools in India face a common problem: selecting quality students for their MBA programs so as to achieve the desired placement percentage. Such data sets are generally biased towards one class, i.e., imbalanced, and learning from imbalanced data is difficult. This paper proposes an imbalanced ensemble classifier that handles the imbalanced nature of the data and achieves higher accuracy on the combined feature selection (selecting important characteristics of students) and classification (predicting placement from those characteristics) problem for an Indian business school dataset. The optimal value of an important model parameter is also found. Numerical evidence from the Indian business school dataset is provided to assess the performance of the proposed classifier.
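    As a rough illustration of one common way to handle class imbalance inside an ensemble, the sketch below builds several balanced training subsets by randomly undersampling the majority class and combines base-classifier predictions by majority vote. This is a generic technique, not necessarily the authors' classifier; the function names and the toy data are hypothetical.

```python
import random
from collections import Counter


def balanced_bags(X, y, n_bags, seed=0):
    """Build n_bags balanced training subsets from an imbalanced binary dataset.

    Every bag keeps all minority-class examples plus an equal-sized random
    sample (without replacement) of majority-class examples.
    """
    rng = random.Random(seed)
    counts = Counter(y)
    if len(counts) != 2:
        raise ValueError("expected exactly two classes")
    minority, majority = sorted(counts, key=counts.get)  # fewest examples first
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    majority_idx = [i for i, label in enumerate(y) if label == majority]
    bags = []
    for _ in range(n_bags):
        idx = minority_idx + rng.sample(majority_idx, len(minority_idx))
        bags.append(([X[i] for i in idx], [y[i] for i in idx]))
    return bags


def majority_vote(predictions):
    """Combine per-classifier label lists by majority vote (ties go to the label seen first)."""
    return [Counter(column).most_common(1)[0][0] for column in zip(*predictions)]


# Hypothetical toy data: 2 students in class 1 and 8 in class 0.
X = [[float(i)] for i in range(10)]
y = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
for X_bag, y_bag in balanced_bags(X, y, n_bags=3):
    print(Counter(y_bag))  # each bag holds two examples of each class
```

    A base classifier would then be trained on each bag, and its predictions on new students combined with majority_vote.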

    A Nonparametric Ensemble Binary Classifier and its Statistical Properties

    In this work, we propose an ensemble of classification trees (CT) and artificial neural networks (ANN). Several statistical properties of the proposed classifier are shown, including universal consistency and an upper bound on an important model parameter. Numerical evidence from various real-life data sets is provided to assess the performance of the model. The proposed nonparametric ensemble classifier does not suffer from the "curse of dimensionality" and can be used in a wide variety of combined feature selection and classification problems. It performs better than many other state-of-the-art models used in similar situations.

    Real-time forecasts and risk assessment of novel coronavirus (COVID-19) cases: A data-driven analysis

    The coronavirus disease 2019 (COVID-19) has become a public health emergency of international concern, affecting 201 countries and territories around the globe. As of April 4, 2020, the pandemic had caused more than 1,116,643 confirmed infections and more than 59,170 reported deaths worldwide. This paper has two main aims: (a) generating short-term (real-time) forecasts of future COVID-19 cases for multiple countries; and (b) assessing the risk of COVID-19 (in terms of case fatality rate) for several severely affected countries by identifying important demographic characteristics of the countries along with some disease characteristics. For the first problem, we present a hybrid approach based on an autoregressive integrated moving average (ARIMA) model and a wavelet-based forecasting model that generates short-term (ten-day-ahead) forecasts of the number of daily confirmed cases for Canada, France, India, South Korea, and the UK. These predictions can support the effective allocation of health care resources and act as an early-warning system for government policymakers. For the second problem, we apply an optimal regression tree algorithm to find essential causal variables that significantly affect the case fatality rates of different countries. This data-driven analysis provides insights for early risk assessment in 50 severely affected countries.
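    The case fatality rate used for risk assessment can be illustrated with the crude ratio of reported deaths to confirmed cases. The sketch below applies it to the worldwide totals quoted above. Because both totals are lower bounds ("more than"), the result is only an approximate snapshot, and the crude ratio ignores the reporting delay between confirmation and death. The function name is hypothetical.

```python
def case_fatality_rate(deaths: int, confirmed: int) -> float:
    """Crude case fatality rate: reported deaths divided by confirmed cases."""
    if confirmed <= 0:
        raise ValueError("confirmed must be positive")
    return deaths / confirmed


# Worldwide totals quoted in the abstract for April 4, 2020 (both are lower bounds).
cfr = case_fatality_rate(59_170, 1_116_643)
print(f"crude CFR: {cfr:.2%}")  # → crude CFR: 5.30%
```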

    Bayesian Neural Tree Models for Nonparametric Regression

    Frequentist and Bayesian methods differ in many aspects but share some basic optimal properties. In real-life classification and regression problems, there are situations in which a model based on one of the methods is preferable according to some subjective criterion. Nonparametric classification and regression techniques, such as decision trees and neural networks, have both frequentist (classification and regression trees (CART) and artificial neural networks) and Bayesian (Bayesian CART and Bayesian neural networks) approaches to learning from data. In this work, we present two hybrid models combining the Bayesian and frequentist versions of CART and neural networks, which we call Bayesian neural tree (BNT) models. Both models exploit the architecture of decision trees and have fewer parameters to tune than advanced neural networks. Such models can simultaneously perform feature selection and prediction, are highly flexible, and generalize well in settings with a limited number of training observations. We study the consistency of the proposed models and derive the optimal value of an important model parameter. We also provide illustrative examples using a wide variety of real-life regression data sets.

    A novel distribution-free hybrid regression model for manufacturing process efficiency improvement

    This work is motivated by a particular problem in a modern paper manufacturing industry, in which maximum efficiency of the fiber-filler recovery process is desired. Large amounts of unwanted material, along with valuable fibers and fillers, come out as a by-product of the paper manufacturing process and mostly go to waste. The job of an efficient Krofta supracell is to separate the unwanted materials from the valuable ones so that fibers and fillers can be collected from the waste and reused in the manufacturing process. The efficiency of the Krofta supracell depends on several crucial process parameters, and monitoring them is difficult. To address the low recovery percentage of the supracell, we propose a hybrid RT-ANN model, a novel hybridization of regression trees (RT) and artificial neural networks (ANN). The model is used to improve supracell efficiency, namely, to increase the percentage recovery. In addition, theoretical results on the universal consistency of the proposed model are given, along with the optimal value of a vital model parameter. Experimental findings show that the proposed hybrid RT-ANN model predicts Krofta recovery percentage more accurately than other conventional regression models for this problem. This work can help the paper manufacturing company become more environmentally friendly, with minimal ecological damage and improved waste recovery.

    An Interpretable Probabilistic Autoregressive Neural Network Model for Time Series Forecasting

    Time series forecasting is an emerging field of data science, with applications ranging from stock price and exchange rate prediction to the early prediction of epidemics. Numerous statistical and machine learning methods have been proposed over the last five decades in response to the demand for high-quality and reliable forecasts. However, in real-life prediction problems, there are situations in which a model based on one of these paradigms is preferable, and hybrid solutions are therefore needed to bridge the gap between classical forecasting methods and scalable neural network models. We introduce an interpretable probabilistic autoregressive neural network (PARNN) model, an explainable, scalable, and "white box-like" framework that can handle a wide variety of irregular time series data (e.g., with nonlinearity and nonstationarity). Sufficient conditions for asymptotic stationarity and geometric ergodicity are obtained by considering the asymptotic behavior of the associated Markov chain. In computational experiments, PARNN outperforms standard statistical, machine learning, and deep learning models on a diverse collection of real-world datasets from economics, finance, and epidemiology, among other domains. Furthermore, the proposed PARNN model significantly improves forecast accuracy on 10 of 12 datasets compared with state-of-the-art models for short- to long-term forecasts.
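    To make the autoregressive part concrete, the sketch below shows a generic single-hidden-layer autoregressive neural network: p lagged values form the input, and multi-step forecasts are produced recursively by feeding each prediction back in as the newest lag. It is not the PARNN model itself, whose probabilistic and interpretable components are not shown here; the tanh activation, the function names, and the weights are assumptions, and the weights are untrained placeholders.

```python
import math
from typing import List, Sequence, Tuple

# Parameters: (hidden weights [k x p], hidden biases [k], output weights [k], output bias)
Params = Tuple[List[List[float]], List[float], List[float], float]


def lagged_inputs(series: Sequence[float], p: int):
    """Return (inputs, targets); each input row is [y_{t-1}, ..., y_{t-p}] for target y_t."""
    rows, targets = [], []
    for t in range(p, len(series)):
        rows.append([series[t - i] for i in range(1, p + 1)])
        targets.append(series[t])
    return rows, targets


def arnn_step(lags: Sequence[float], params: Params) -> float:
    """One forward pass: tanh hidden layer over the p lags, then a linear output."""
    W_hidden, b_hidden, w_out, b_out = params
    hidden = [math.tanh(b + sum(w * x for w, x in zip(row, lags)))
              for row, b in zip(W_hidden, b_hidden)]
    return b_out + sum(w * h for w, h in zip(w_out, hidden))


def recursive_forecast(history: Sequence[float], horizon: int, params: Params) -> List[float]:
    """Multi-step forecast: each prediction is appended and reused as the newest lag."""
    p = len(params[0][0])
    window = list(history[-p:])
    preds = []
    for _ in range(horizon):
        lags = window[::-1][:p]  # most recent first, matching lagged_inputs
        y_hat = arnn_step(lags, params)
        preds.append(y_hat)
        window.append(y_hat)
    return preds


# Untrained, hypothetical weights for p = 2 lags and k = 2 hidden units.
params: Params = ([[0.5, -0.2], [0.1, 0.3]], [0.0, 0.1], [0.8, -0.4], 0.05)
print(lagged_inputs([1.0, 2.0, 3.0, 4.0], p=2))
print(recursive_forecast([1.0, 2.0, 3.0], horizon=3, params=params))
```

    In practice the weights would be fitted to the rows and targets returned by lagged_inputs before forecasting.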

    Prediction of Transportation Index for Urban Patterns in Small and Medium-sized Indian Cities using Hybrid RidgeGAN Model

    The rapid urbanization trend in most developing countries, including India, is creating a plethora of civic concerns, such as loss of green space, degradation of environmental health, limited availability of clean water, air pollution, and traffic congestion leading to delays in vehicular transportation. Transportation and network modeling through transportation indices has been widely used in the recent past to understand transportation problems. Predicting transportation indices is therefore necessary to facilitate sustainable urban planning and traffic management. Recent advances in deep learning research, in particular Generative Adversarial Networks (GANs) and their modifications for spatial data analysis such as CityGAN, Conditional GAN, and MetroGAN, have enabled urban planners to simulate hyper-realistic urban patterns. These synthetic urban universes mimic global urban patterns, and evaluating their landscape structures through spatial pattern analysis can aid in comprehending landscape dynamics, thereby enhancing sustainable urban planning. This research addresses several challenges in predicting the urban transportation index for small and medium-sized Indian cities. A hybrid framework based on Kernel Ridge Regression (KRR) and CityGAN is introduced to predict the transportation index using spatial indicators of human settlement patterns. This paper establishes a relationship between the transportation index and human settlement indicators and models it using KRR for 503 selected Indian cities. The proposed hybrid pipeline, which we call the RidgeGAN model, can evaluate the sustainability of urban sprawl associated with infrastructure development and transportation systems in sprawling cities. Experimental results show that the two-step pipeline outperforms existing benchmarks on spatial and statistical measures.
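    The KRR component can be sketched in a few lines: with kernel matrix K and regularization strength λ, the dual coefficients solve (K + λI)α = y, and a new point x is predicted as Σᵢ αᵢ k(xᵢ, x). The sketch below uses a Gaussian (RBF) kernel and a small Gaussian-elimination solver so that it needs only the standard library. The kernel choice, the values of lam and gamma, the toy data, and all names are assumptions, and the CityGAN stage of RidgeGAN is not shown.

```python
import math
from typing import List, Sequence


def rbf(a: Sequence[float], b: Sequence[float], gamma: float) -> float:
    """Gaussian kernel exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


class KernelRidge:
    def __init__(self, lam: float = 1e-3, gamma: float = 1.0):
        self.lam, self.gamma = lam, gamma

    def fit(self, X, y):
        n = len(X)
        K = [[rbf(X[i], X[j], self.gamma) + (self.lam if i == j else 0.0)
              for j in range(n)] for i in range(n)]
        self.alpha = solve(K, list(y))  # alpha = (K + lam I)^{-1} y
        self.X = [list(x) for x in X]
        return self

    def predict(self, X_new):
        return [sum(a * rbf(xi, x, self.gamma) for a, xi in zip(self.alpha, self.X))
                for x in X_new]


# Hypothetical toy data: one settlement indicator per city -> transportation index.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0.1, 0.9, 2.1, 2.9]
model = KernelRidge(lam=1e-2, gamma=0.5).fit(X, y)
print(model.predict([[1.5]]))
```

    Larger lam values smooth the fit more; with several indicators per city, each row of X would simply hold more features.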

    Epicasting: An Ensemble Wavelet Neural Network (EWNet) for Forecasting Epidemics

    Infectious diseases remain among the top contributors to human illness and death worldwide, and many of them produce epidemic waves of infection. The unavailability of specific drugs and ready-to-use vaccines for most of these epidemics makes the situation worse. This forces public health officials and policymakers to rely on early warning systems generated by reliable and accurate forecasts of epidemics. Accurate forecasts can help stakeholders tailor countermeasures, such as vaccination campaigns, staff scheduling, and resource allocation, to the situation at hand, which could reduce the impact of a disease. Unfortunately, most past epidemics exhibit nonlinear and non-stationary characteristics, owing to season-dependent fluctuations in their spread and to the nature of the epidemics themselves. We analyze a wide variety of epidemic time series datasets using an autoregressive neural network based on the maximal overlap discrete wavelet transform (MODWT), which we call the EWNet model. MODWT techniques effectively characterize non-stationary behavior and seasonal dependencies in epidemic time series and improve the nonlinear forecasting scheme of the autoregressive neural network in the proposed ensemble wavelet network framework. From a nonlinear time series viewpoint, we explore the asymptotic stationarity of the proposed EWNet model, showing the asymptotic behavior of the associated Markov chain. We also theoretically investigate learning stability and the effect of the choice of hidden neurons in the proposed model. From a practical perspective, we compare the proposed EWNet framework with several statistical, machine learning, and deep learning models. Experimental results show that EWNet is highly competitive with state-of-the-art epidemic forecasting methods.
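    The MODWT decomposition step can be sketched with the Haar filter, an assumed choice since the abstract does not name a filter. Using the pyramid algorithm with circular boundary handling, level j applies the filters (1/2, -1/2) and (1/2, 1/2) to values 2^(j-1) steps apart. A useful check is that the MODWT preserves energy: the squared norms of the wavelet coefficients at every level, plus that of the final scaling coefficients, add up to the squared norm of the series. The function name and the toy series are hypothetical, and how EWNet combines the coefficients with the autoregressive neural network is not shown.

```python
from typing import List, Tuple


def haar_modwt(x: List[float], levels: int) -> Tuple[List[List[float]], List[float]]:
    """Haar MODWT via the pyramid algorithm with circular boundary handling.

    Returns the wavelet coefficients W_1..W_J and the final scaling coefficients V_J.
    """
    if levels < 1:
        raise ValueError("levels must be at least 1")
    n = len(x)
    v = list(x)
    details = []
    for j in range(1, levels + 1):
        shift = 2 ** (j - 1)  # filter taps are 2^(j-1) samples apart at level j
        w = [(v[t] - v[(t - shift) % n]) / 2 for t in range(n)]  # wavelet filter (1/2, -1/2)
        v = [(v[t] + v[(t - shift) % n]) / 2 for t in range(n)]  # scaling filter (1/2, 1/2)
        details.append(w)
    return details, v


series = [1.0, 3.0, 2.0, 5.0, 4.0, 0.0, 2.0, 1.0]  # hypothetical weekly case counts
W, V = haar_modwt(series, levels=2)
energy = sum(w * w for level in W for w in level) + sum(v * v for v in V)
print(energy, sum(x * x for x in series))  # the two values agree up to rounding
```

    Unlike the ordinary DWT, each level keeps all n coefficients (no downsampling), which is why the MODWT can be applied to series of any length.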

    Semiparametric Survival Analysis of 30-Day Hospital Readmissions with Bayesian Additive Regression Kernel Model

    In this paper, we introduce a kernel-based nonlinear Bayesian model for right-censored survival outcome data. Our kernel-based approach provides a flexible nonparametric modeling framework for exploring nonlinear relationships between predictors and right-censored survival outcomes. The proposed model is shown to provide excellent predictive performance in several simulation studies and real-life examples. Unplanned hospital readmissions greatly impair patients’ quality of life and impose a significant economic burden on American society. We focus our application on predicting 30-day patient readmissions. Our survival Bayesian additive regression kernel model (survival BARK, or sBARK) improves the timeliness of preventive interventions for readmission through a data-driven approach.
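    Right-censoring means that for some patients we only know they were not readmitted up to the end of their follow-up. The sketch below does not implement sBARK; it uses the standard Kaplan-Meier estimator, a swapped-in and much simpler nonparametric method, to show how censored follow-up still contributes to an estimate of the probability of readmission within 30 days. The data and function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Kaplan-Meier estimate of the probability of remaining readmission-free.

    times: follow-up in days; events: 1 = readmitted, 0 = right-censored.
    Returns (time, survival probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events_at_t = removed = 0
        while i < len(data) and data[i][0] == t:  # group ties at time t
            events_at_t += data[i][1]
            removed += 1
            i += 1
        if events_at_t:
            surv *= 1 - events_at_t / at_risk
            curve.append((t, surv))
        at_risk -= removed  # censored patients leave the risk set after time t
    return curve


def prob_event_by(curve: List[Tuple[float, float]], horizon: float) -> float:
    """Estimated probability of readmission on or before `horizon` days."""
    surv = 1.0
    for t, s in curve:
        if t > horizon:
            break
        surv = s
    return 1 - surv


# Hypothetical follow-up: days until readmission (event=1) or censoring (event=0).
times = [5, 10, 10, 20, 30, 40]
events = [1, 1, 0, 0, 1, 0]
curve = kaplan_meier(times, events)
print(prob_event_by(curve, 30))  # about 0.667
```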
